A High Performance Non-Blocking Checkpointing/Recovery Algorithm For Ring Networks

Authors

  • Bidyut Gupta
  • Namdar Mogharreban
  • Shahram Rahimi
  • A. Vemuri

Abstract

In this paper, we propose a new checkpointing/recovery algorithm for ring network architectures. The checkpointing algorithm produces a consistent set of checkpoints in a uni-directional network using only a few control messages and, unlike most existing checkpointing algorithms, avoids the overhead of taking temporary checkpoints. The number of interrupts to the processes is also small, which ensures fast termination of both the checkpointing algorithm and the application program. The main features of the recovery algorithm are that it is a single-step, non-blocking algorithm with very few interrupts to the processes.

Similar resources

Design of High Performance Distributed Snapshot/Recovery Algorithms for Ring Networks

In this work, we have presented non-blocking checkpointing and recovery algorithms for bidirectional networks. We have deviated from the conventional approach of taking first temporary checkpoints and then converting them to permanent ones by processes (as followed by any coordinated checkpointing scheme). Thus, the proposed coordinated checkpointing algorithm allows processes to take permanent c...

Blocking and Non-blocking Checkpointing and Rollback Recovery for Networks-on-Chip

In this paper we propose a dynamically reconfigurable failure recovery scheme developed for Network-on-Chip (NoC) based systems. The recovery scheme is based on a checkpointing and rollback protocol and permits enhancing the system fault tolerance capabilities by exploiting information on traffic load and failure rate. The increased performance of the fault tolerance mechanism is achieved by si...

A User-triggered Checkpointing Library for Computation-intensive Applications

We propose a method to incorporate coordinated checkpointing and rollback in high performance computing applications on massively parallel computers. A library allows the user to specify which data-items (including files) belong to the contents of the checkpoint, and to trigger the checkpointing in the application. The recovery-line management on the distributed disk system takes care of which ...

An Enhanced MSS-based checkpointing Scheme for Mobile Computing Environment

Mobile computing systems are made up of different components among which Mobile Support Stations (MSSs) play a key role. This paper proposes an efficient MSS-based non-blocking coordinated checkpointing scheme for mobile computing environment. In the scheme suggested nearly all aspects of checkpointing and their related overheads are forwarded to the MSSs and as a result the workload of Mobile ...

An Application-Transparent, Platform-Independent Approach to Rollback-Recovery for Mobile Agent Systems

This paper proposes a new approach to rollback-recovery for mobile-agent systems, and describes its implementation in the MESSENGERS mobile agents system. The used checkpointing method allows to implement space and time efficient, user-transparent rollback-recovery in heterogeneous distributed environments. Together with an efficient non-blocking system snapshot algorithm this checkpointing met...

Publication date: 2006